feat(hunspell): Add ref_path support for package-based dictionary loa…#20840
Conversation
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit ba3f2c7. 'Diff too large, requires skip by maintainers after manual review' Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
ba3f2c7 to
ea0b5f9
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit ea0b5f9. 'Diff too large, requires skip by maintainers after manual review' Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
ea0b5f9 to
c77e078
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit c77e078. 'Diff too large, requires skip by maintainers after manual review' Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
- Add GET /_hunspell/cache endpoint for viewing cached dictionary keys (cluster:monitor/hunspell/cache) - Add POST /_hunspell/cache/_invalidate endpoint for cache invalidation (cluster:admin/hunspell/cache/invalidate) - Support invalidation by package_id, locale, cache_key, or invalidate_all - Add TransportHunspellCacheInfoAction and TransportHunspellCacheInvalidateAction - Consistent response schema with all fields always present - Register actions and REST handler in ActionModule - Add comprehensive unit tests, REST handler tests, and integration test Depends on opensearch-project#20840 Signed-off-by: Ayush Sharma <118544643+shayush622@users.noreply.github.com> Signed-off-by: shayush622 <ayush5267@gmail.com>
There was a problem hiding this comment.
Pull request overview
This PR adds support for loading Hunspell dictionaries from package-scoped directories via a new ref_path parameter, enabling per-package dictionary isolation and cache invalidation APIs to support hot-reload workflows.
Changes:
- Add package-based dictionary loading in
HunspellServicewith{packageId}:{locale}cache keys and cache invalidation helpers. - Extend
HunspellTokenFilterFactoryto acceptref_path, validate identifiers, and supportupdateable: trueviaAnalysisMode.SEARCH_TIME. - Add/extend tests and test resources for package-based Hunspell dictionaries, and note the feature in the changelog.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| server/src/main/java/org/opensearch/indices/analysis/HunspellService.java | Implements package-based dictionary loading + cache key utilities and invalidation methods. |
| server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java | Adds ref_path parameter support, identifier validation, and hot-reload analysis mode handling. |
| server/src/main/java/org/opensearch/indices/analysis/AnalysisModule.java | Exposes HunspellService via a now-public getter for downstream wiring. |
| server/src/main/java/org/opensearch/node/Node.java | Wires HunspellService into node bootstrap path (currently introduces a constructor mismatch). |
| server/src/test/java/org/opensearch/indices/analyze/HunspellServiceTests.java | Adds unit tests for package-based loading, caching, and invalidation APIs. |
| server/src/test/java/org/opensearch/index/analysis/HunspellTokenFilterFactoryTests.java | Adds integration-style tests for ref_path behavior, validation, and updateable mode. |
| server/src/test/resources/indices/analyze/conf_dir/packages/test-pkg/hunspell/en_US/en_US.aff | Adds package-based hunspell test affix file under config/packages/.... |
| CHANGELOG.md | Adds a changelog entry for the new ref_path support. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java
Outdated
Show resolved
Hide resolved
| String prefix = packageId + CACHE_KEY_SEPARATOR; // Match keys like "pkg-1234:en_US" | ||
| int sizeBefore = dictionaries.size(); | ||
| dictionaries.keySet().removeIf(key -> key.startsWith(prefix)); | ||
| int count = sizeBefore - dictionaries.size(); | ||
|
|
||
| if (count > 0) { | ||
| logger.info("Invalidated {} hunspell dictionary cache entries for package: {}", count, packageId); | ||
| } else { | ||
| logger.debug("No cached dictionaries found for package: {}", packageId); | ||
| } |
There was a problem hiding this comment.
invalidateDictionariesByPackage() computes the invalidated count as sizeBefore - dictionaries.size(), but this can be incorrect in concurrent scenarios (entries can be added/removed while the method runs). Either document that the returned count is approximate (as done in invalidateAllDictionaries()), or compute the count based on the keys actually removed.
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java
Show resolved
Hide resolved
|
Persistent review updated to latest commit 0fca446 |
|
❌ Gradle check result for 0fca446: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
0fca446 to
1513ae8
Compare
|
Persistent review updated to latest commit 1513ae8 |
|
Persistent review updated to latest commit fc6800b |
fc6800b to
6e7ed83
Compare
|
Persistent review updated to latest commit 6e7ed83 |
6e7ed83 to
e23316e
Compare
|
Persistent review updated to latest commit e23316e |
|
❌ Gradle check result for e23316e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
e23316e to
4630c23
Compare
|
Persistent review updated to latest commit 4630c23 |
|
Persistent review updated to latest commit 6fd0caf |
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/index/analysis/HunspellTokenFilterFactory.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/indices/analysis/HunspellService.java
Show resolved
Hide resolved
…ding
- Add ref_path parameter to HunspellTokenFilterFactory for package-based dictionaries
- Load from config/analyzers/{packageId}/hunspell/{locale}/
- Node-level cache with {packageId}:{locale} cache keys for multi-tenant isolation
- Refactor loadDictionary to accept baseDir parameter for code reuse
- Add regex allowlist validation for ref_path and locale
- Shared loadDictionaryFromDirectory for .aff/.dic file loading
- Backward compatible: traditional config/hunspell/{locale}
Signed-off-by: shayush622 <ayush5267@gmail.com>
6fd0caf to
d501490
Compare
|
Persistent review updated to latest commit d501490 |
Signed-off-by: Ayush Sharma <118544643+shayush622@users.noreply.github.com>
|
Persistent review updated to latest commit ef1e012 |
|
❕ Gradle check result for ef1e012: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
…ding (opensearch-project#20840) - Add ref_path parameter to HunspellTokenFilterFactory for package-based dictionaries - Load from config/analyzers/{packageId}/hunspell/{locale}/ - Node-level cache with {packageId}:{locale} cache keys for multi-tenant isolation - Refactor loadDictionary to accept baseDir parameter for code reuse - Add regex allowlist validation for ref_path and locale - Shared loadDictionaryFromDirectory for .aff/.dic file loading - Backward compatible: traditional config/hunspell/{locale} Signed-off-by: shayush622 <ayush5267@gmail.com> Signed-off-by: Ayush Sharma <118544643+shayush622@users.noreply.github.com> Signed-off-by: kkewwei <kkewwei@163.com>
Description
This PR adds support for loading Hunspell dictionaries from package-based directories using a new
ref_pathparameter in the hunspell token filter. This enables multi-tenant dictionary isolation where each package can have its own hunspell dictionaries independent of the traditionalconfig/hunspell/location.Key changes:
ref_pathparameter: New optional parameter in the hunspell token filter that specifies a package ID. When used withlocale, dictionaries are loaded fromconfig/packages/{ref_path}/hunspell/{locale}/{packageId}:{locale}cache keys to avoid collisions with traditional locale-only keysinvalidateDictionary(),invalidateDictionariesByPackage(), andinvalidateAllDictionaries()methods to HunspellService for programmatic cache controlvalidatePackageIdentifier()updateable: trueflag enables_reload_search_analyzersAPI support for package-based dictionariesconfig/hunspell/{locale}/) continues to work unchangedUsage example:
{ "type": "hunspell", "ref_path": "pkg-1234", "locale": "en_US" }This is Part 1 of 2 — a follow-up PR will add REST API endpoints for cache info and invalidation. Split from #20792.
Related Issues
Resolves #20712
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.